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Abstract 

In this paper, we propose a unified algorithmic framework for solving many known variants of MDS. 
Our algorithm is a simple iterative scheme with guaranteed convergence, and is modular; by changing the 
internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In 
addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge 
to better quality solutions than existing methods, in comparable time. We expect that this framework will be 
y—{ useful for a number of MDS variants that have not yet been studied. 

Our framework extends to embedding high-dimensional points lying on a sphere to points on a lower di- 
CN mensional sphere, preserving geodesic distances. As a compliment to this result, we also extend the Johnson- 

H Lindenstrauss Lemma to this spherical setting, where projecting to a random 0((l/e^)logn)-dimensional 

s — ■ 

^ 1 Introduction 

Multidimensional scaling (MDS) Il23[|10[ l3ll is a w^idely used method for embedding a general distance matrix 
^ 1 into a low dimensional Euclidean space, used both as a preprocessing step for many problems, as well as a 
^ visualization tool in its ovra right. MDS has been studied and used in psychology since the 1930s 11351 1331 [22|1 
O to help visualize and analyze data sets where the only input is a distance matrix. More recently MDS has 
become a standard dimensionality reduction and embedding technique to manage the complexity of dealing 
CN with large high dimensional data sets |]8l |9l [ST] |6l] . 

^ In general, the problem of embedding an arbitrary distance matrix into a fixed dimensional Euclidean space 

^ with minimum error is nonconvex (because of the dimensionality constraint). Thus, in addition to the stan- 
dard formulation [12], many variants of MDS have been proposed, based on changing the underlying error 
function 11351 |8l| . There are also applications where the target space, rather than being a Euclidean space, is 
2 a manifold (e.g. a low dimensional sphere), and various heuristics for MDS in this setting have also been 
proposed 11131 l6f|. 

y—^ Each such variant is typically addressed by a different heuristic, including majorization, the singular value de- 

composition, semidefinite programming, subgradient methods, and standard Lagrange-multipler-based meth- 
ods (in both primal and dual settings). Some of these heuristics are efficient, and others are not; in general, 
every new variant of M DS seems to require different ideas for efficient heuristics. 



X 



1.1 Our Work 

In this paper, we present a unified algorithmic framework for solving many variants of MDS. Our approach is 
based on an iterative local improvement method, and can be summarized as follows: "Pick a point and move 
it so that the cost function is locally optimal. Repeat this process until convergence." The improvement step 
reduces to a well-studied and efficient family of iterative minimization techniques, where the specific algorithm 
depends on the variant of M DS. 

A central result of this paper is a single general convergence result for all variants of MDS that we examine. 
This single result is a direct consequence of the way in which we break down the general problem into an 
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iterative algorithm combined with a point-wise optimization scheme. Our approach is generic, efficient, and 
simple. The high level framework can be written in 10-12 lines of MATLAB code, with individual function- 
specific subroutines needing only a few more lines each. Further, our approach compares well with the best 
methods for all the variants of MDS. In each case our method is consistently either the best performer or is 
close to the best, regardless of the data profile or cost function used, while other approaches have much more 
variable performance. Another useful feature of our method is that it is parameter-free, requiring no tuning 
parameters or Lagrange multipliers in order to perform at its best. 

Spherical MDS. An important application of our approach is the problem of performing spherical MDS. 
Spherical MDS is the problem of embedding a matrix of distances onto a (low-dimensional) sphere. Spheri- 
cal MDS has applications in texture mapping and image analysis [6], and is a generalization of the spherical 
dimensionality reduction problem, where the goal is to map points from a high dimensional sphere onto a low- 
dimensional sphere. This latter problem is closely related to dimensionality reduction for finite dimensional dis- 
tributions. A well-known isometric embedding takes a distribution represented as a point on the d -dimensional 
simplex to the d -dimensional sphere while preserving the Hellinger distance between distributions. A spherical 
dimensionality reduction result is an important step to representing high dimensional distributions in a lower- 
dimensional space of distributions, and will have considerable impact in domains that represent data natively 
as histograms or distributions, such as in document processing ||29[ [T8l [2]] , image analysis 11251 lllll and speech 
recognition [11611 . 

Our above framework applies directly to this setting, where for the local improvement step we adapt a 
technique first developed by Karcher for finding geodesic means on a manifold. In addition, we prove a 
Johnson-Lindenstrauss-type result for the sphere; namely, that n points lying on a d -dimensional sphere can 
be embedded on a 0((l/e^)logn)-dimensional sphere while approximately preserving the geodesic distances 
between pairs of points, that is, no distance changes by more than a relative (1 -I- e)-factor. This latter result 
can be seen as complementary to the local improvement scheme; the formal embedding result guarantees the 
error while being forced to use log n dimensions, while the local improvement strategy generates a mapping 
into any k dimensional hypersphere but provides no formal guarantees on the error. 

Summary of contributions. The main contributions of this paper can be summarized as follows: 

• In Section |4|we present our iterative framework, illustrate how it is applied to specific MDS variants and 
prove a convergence result. 

• In Section [s] we present a comprehensive experimental study that compares our approach to the prior 
best known methods for different MDS variants. 

• In Section |6] we prove a formal dimensionality reduction result that embeds a set of n points on a high- 
dimensional sphere into a sphere of dimension 0((l/e^)logn) while preserving all distances to within 
relative error of (1 -I- e) for any e > 0. 

2 Background and Existing Methods 

Multidimensional scaling is a family of methods for embedding a distance matrix into a low-dimensional Eu- 
clidean space. There is a general taxonomy of MDS methods [10,1 : in this paper we will focus primarily the 
metric and generalized MDS problems. 

The traditional formulation of MDS II23II assumes that the distance matrix D arises from points in some d- 
dimensional Euclidean space. Under this assumption, a simple transformation takes D to a matrix of similarities 
S, where s^j = {x^,Xj). These similarities also arise from many psychology data sets directly [i33l 1351! . The 



2 



problem then reduces to finding a set of points X in /c-dimensional space such that XX^ approximates S. This 
can be done optimally using the top k singular values and vectors from the singular value decomposition of S. 

A more general approach called SMACOF that drops the Euclidean assumption uses a technique known as 
stress majorization ||27[ll2[[T3ll . It has been adapted to many other MDS variants as well including restrictions 
of data to lie on quadratic surfaces and specifically spheres II13II . 

Since the sum-of-squares error metric is sensitive to outliers, Cayton and Dasgupta [8] proposed a robust 
variant based on an £ ^ error metric. They separate the rank and cost constraints, solving the latter using either 
semidefinite programming or a subgradient heuristic, followed by a singular value decomposition to enforce 
the rank constraints. 

Many techniques have been proposed for performing spherical MDS. Among them are majorization methods 
i^30] and SMACOF-Q [13J), a multiresolution approach due to Elad, Keller, and Kimmel [14] and an approach 
based on computing the classical MDS and renormalizing [31 ]. 

Embeddings that guarantee bounded error. A complementary line of work in dimensionality reduction 
fixes an error bound for every pair of distances (rather than computing an average error), and asks for the 
minimum dimension a data set can be embedded in while maintaining this error. The Johnson-Lindenstrauss 
Lemma ||19|| states that any collection of n points in a Euclidean space can be embedded in a 0((l/e^)logn) 
dimensional Euclidean space that preserves all distances within a relative error of e. If the points instead define 
an abstract metric space, then the best possible result is an embedding into 0(log n)-dimensional Euclidean 
space that preserves distances up to a factor of O(logn). An exhaustive survey of the different methods for 
dimensionality reduction is beyond the scope of this paper - the reader is directed to the survey by Indyk and 
Matousek for more information II17II . 

The Johnson-Lindenstrauss Lemma can be extended to data lying on manifolds. Any manifold M with "lin- 
earization dimension" k (a measure of its complexity) can be embedded into a 0((l/e^)fclog(fcn)) dimensional 
space so that all pairwise Euclidean distances between points on M are distorted by at most a relative (1 -I- e)- 
factor im i32[[26ll . A /c-dimensional sphere has linearization dimension 0(fc), so this bound applies directly for 
preserving the chordal (i.e. Euclidean) distance between points on a sphere. The geodesic distance between 
points on a sphere can be interpreted as the angle between the points in radians, and a result by Magen [26'J 
show that 0((l/£^)logn) dimensions preserve angles to within a relative factor of 1 + -/e (which is weaker 
than our result preserving the geodesic distance to within a relative factor of (1 -I- e)). 

3 Definitions 

Let D = (djj) be an n X n matrix representing distances between all pairs of points in a set 7 = {jj, . . .y^}. 
In general, we assume that D is symmetric = dj;, although our method does not formally require this. The 
multidimensional scaling problem takes as input Y, D and k, and asks for a mapping : y — > X from 7 to a set 
of points X in a /c-dimensional space T such that the difference between the original and resulting distances is 
minimized. 

There are many different ways to measure the difference between the sets of distances, and these can be 
captured by the following general function 

C(X, D) = 22 Err(/(Xi, Xj) - d^j) 

i j 

where Err measures the discrepancy between the source and target distances, and / denotes the function that 
measures distance in the target space. 

• T = M*^, Err(5) = x') = ||x — x'\\2: This is a general form of the MDS problem, which we refer to 

as f MDS. 
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• T = R^, Err(5) = \5\,f{^x,x') = \\x - x'\\2: This is a robust variant of MDS called rMDS, first suggested 
by Cayton and Dasgupta [i8|] . 

• T = S^, Err(5) = |5| or 5^, is either chordal (c) or geodesic distance (g) on S*^. We refer to this 
family of problems as {c,g}-{l,2}-sMDS. 

It will be convenient to split the expression into component terms. We define 

Q(X, D, Xi) = Err(/(x;, - dij) 
i 

which allows us to write C(X, D) = C;(X, D, X;). 

Notes. The actual measure studied by Cayton and Dasgupta ^ is not rMDS. It is a variant which takes the 
absolute difference of the squared distance matrices. We call this measure r^MDS. Also, classical MDS does 
not appear in this list since it tries to minimize the error between similarities rather than distances. We refer to 
this measure as cMDS. 

4 Algorithm 

We now present our algorithm PlaceCenter(X,D) that finds a mapping y — > X minimizing C(X,D). For now 
we assume that we are given an initial embedding Xi e R*^ to seed our algorithm. Our experiments indicate 
the SVD-based approach [35j is almost always the optimal way to seed the algorithm, and we use it unless 
specifically indicated otherwise. 

Algorithm 1 PlaceCenter (D) 

Run any MDS strategy to obtain initial seed X. 
repeat 

e^C(X,D) 
for i = 1 to n do 

X; «— Place;(X,D) {this updates X; eX} 
end for 

until [s - C{X,D) < t) {for a fixed threshold t} 
return X 



PlaceCenter operates by employing a technique from the block-relaxation class of heuristics. The cost 
function can be expressed as a sum of costs for each point X;, and so in each step of the inner loop we find 
the best placement for X; while keeping all other points fixed, using the algorithm PLACEj(X,D). A key insight 
driving our approach is that PLACEi(X,D) can be implemented either iteratively or exactly for a wide class of 
distance functions. The process terminates when over all i, invoking PlacE;(X,D) does not reduce the cost 
C(X, D) by more than a threshold t. The algorithm takes O(n^) for each iteration, since PlacE;(X, D) will take 
0(n) time and computing C(X,D) takes O(n^) time. 

4.1 A Geometric Perspective On Place;(X,D) 

The routine PLACEj(X, D) is the heart of our algorithm. This routine finds the optimal placement of a fixed 
point Xi with respect to the cost function Q(X, D, Xj) = Err(/(xj, Xj) — dij). Set = d^j. Then the optimal 
placement of X; is given by the point x* minimizing the function 

gM = ^ Err(/(x, xj) - r^). 
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Figure 1: A geometric interpretation of the error term g(x). 



Note that the terms /(x,Xj) and = are zero, so we can ignore their presence in the summation for ease 
of notation. 

There is a natural geometric interpretation of g(x), illustrated in Figure [ij Consider a sphere around the 
point Xj of radius rj. Let Xj be the point on this sphere that intersects the ray from Xj towards x. Then the 
distance f(x, Xj) = Xj) — rj\. Thus, we can rewrite as 

g(x) = 2Err(/(x,x^.)). 

This function is well-known in combinatorial optimization as the min-sum problem. For Err(5) = 5^, g(x) 
finds the point minimizing the sum-of-squared distances from a collection of fixed points (the 1-mean), which 
is the centroid x* = j^^jXj. For Err(5) = \5\, g(x) finds the 1-median, the point minimizing the sum of 
distances from a collection of fixed points. Although there is no closed form expression for the 1-median, there 
are numerous algorithms for solving this problem both exactly []34ll and approximately []4l]. Methods that 
converge to the global optimum exist for any Err(5) = < 2; it is known that if p is sufficiently larger 

than 2, then convergent methods may not exist [5j. 

While g(x) can be minimized optimally for error functions Err of interest, the location of the points 
depends on the location of the solution x*, which is itself unknown! This motivates an alternating optimization 
procedure, where the current iterate x is used to compute Xj, and then these Xj are used as input to the 
min-sum problem to solve for the next value of x. 

Algorithm 2 PLACEj(X, D) 
repeat 

for j = 1 to n do 

Xj <— intersection of sphere of radius rj around Xj with ray from Xj towards 
end for 

X; <— Recenter({xi, X2, . . . , x„}) 
until — g(x;) < t) {for a fixed threshold t} 
return X; 



4.2 Implementing Recenter 

Up to this point, the description of PlaceCenter and Place has been generic, requiring no specification of 
Err and /. In fact, all the domain-specificity of the method appears in Recenter, which solves the min-sum 
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problem. We now demonstrate how different implementations of Recenter allow us to solve the different 
variants of MDS discussed above. 

4.2.1 The original MDS: fMDS 

Recall from Section [s] that the fMDS problem is defined by Err(5) = 5^ and /(x,x') = ||x — x'\\2. Thus, 
g(.x) = X!; 11^ As mentioned earlier, the minimum of this function is attained at x* = (1/n) Xj. Thus, 

Recenter({xi, X2, . . . ,x„}) merely outputs (1/n) Xj, and takes 0(n) time per invocation. 

4.2.2 Robust MDS: rMDS 

The robust MDS problem rMDS is defined by Err(5) = \5\ and /(x,xO = ||x — x'\\2. Minimizing the resulting 
function g(x) yields the famous Fermat-Weber problem, or the 1-median problem as it is commonly known. An 
exact iterative algorithm for solving this problem was given by Weiszfeld II34II , and works as follows. At each 
step of PLACE; the value X; is updated by 




This algorithm is guaranteed to converge to the optimal solution ||24[ I28II , and in most settings converges 
quadratically [21 J. 

Other norms and distances. If Err(5) = \5\p,1 < p < 2, then an iterative algorithm along the same lines 
as the Weiszfeld algorithm can be used to minimize g(x) optimally [5j. In practice, this is the most interesting 
range of values for p. It is also known that for p sufficiently larger than 2, this iterative scheme may not 
converge. 

We also can tune PlaceCenter to the r^MDS problem (using squared distances) by setting = dfj. 
4.2.3 Spherical MDS 

Spherical M DS poses special challenges for the implementation of Recenter. Firstly, it is no longer obvious 
what the definition of Xj should be, since the "spheres" surrounding points must also lie on the sphere. Secondly, 
consider the case where Err(5) = 5^, and /(x,x') is given by geodesic distance on the sphere. Unlike in the 
case of m}, we no longer can solve for the minimizer of g(x) by computing the centroid of the given points, 
because this centroid will not in general lie on the sphere, and even computing the centroid followed by a 
projection onto the sphere will not guarantee optimality. 

The first problem can be solved easily. Rather than draw spheres around each x^-, we draw geodesic spheres, 
which are the set of points at a fixed geodesic distance from x^. On the sphere, this set of points can be easily 
described as the intersection of an appropriately chosen halfplane with the sphere. Next, instead of computing 
the intersection of this geodesic sphere with the ray from Xj towards the current estimate of X;, we compute 
the intersection with a geodesic ray from x^- towards X;. 

The second problem can be addressed by prior work on computing min-sums on manifolds. Karcher II20II 
proposed an iterative scheme for the geodesic sum-of-squares problem that always converges as long as the 
points do not span the entire sphere. His work extends (for the same functions Err,/) to points defined on 
more general Riemannian manifolds satisfying certain technical conditions. It runs in 0(n) time per iteration. 

For the robust case (Err(5) = |5|), the Karcher scheme no longer works. For this case, we make use of a 
Weiszfeld-like adaption [il5il that again works on general Riemannian manifolds, and on the sphere in particu- 
lar. Like the Weiszfeld scheme, this approach takes 0(n) time per iteration. 
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4.3 Convergence Proofs 

Here we prove that each step of PlaceCenter converges as long as the recursively called procedures reduce 
the relevant cost functions. Convergence is defined with respect to a cost function k, so that an algorithm 
converges if at each step k: decreases until the algorithm terminates. 

Theorem 4.1. If each call to <— PLACEi(X,D) decreases the cost Q(X, D,X;), then PlaceCenter(D) converges 
with respect to C(-,D). 

Proof. Let X <— PlacE;(X,D) result from running an iteration of PLACEj(X,D). Let X = {xj, . . . , X;, 
X;_,_i, . . . , x„}. Then we can argue 

C(X,D) - ax,D) = 2 J]Err(/(x;,Xj) - d^j) - 2^ Err(/(Xi, x,) - d^j) 

;=1 ;=1 

= 2Ci(X,D,Xi)-2Q(X,D,Xj) >0. 

The last line follows because X and X only differ at X; versus X;, and by assumption on PLACEj(X,D), this 
sub-cost function must otherwise decrease. □ 

Theorem 4.2. If each call X; <— Recenter(X) reduces then PlacE;(X, D,Xi) converges with 

respect to Q(X, D, •)• 

Proof. First we can rewrite 

n n n 

Q(X, D, Xi) = Err(/(Xi, x^) - d^j) = ^ Err((/(Xi, x^) + d^j) - d^^j) = ^ Err(/(Xi, x^)). 

j=i j=i j=i 

Since Err(/(xj, Xj)) measures the distance to the sphere o^, choosing x'- to minimize (or decrease) Err(/(x[, Xj)) 
must decrease the sum of distances to each point Xj on each sphere o^-. Now let x'j be the closest point to x^' on 
o.. Err(/(x;,x;.)) < Err(/(x;,x,)) and thus 

n n n 

Q(X, D, xp = Err(/(x;, xp) < J] Err(/(x;, xp) < ^ Err(/(x,, x^) = Q(X, D, x,) 
j=i j=i j=i 

where equality only holds if X; = x'-, in which case the algorithm terminates. □ 

5 Experiments 

In this section we evaluate the performance of PlaceCenter (PC). Since PC generalizes to many different cost 
functions, we compare it with the best knovm algorithm for each cost function, if one exists. For the f M DS 
problem the leading algorithm is SMACOF [13]; for the r^MDS problem the leading algorithm is by Cayton 
and Dasgupta (CD) [8J. We know of no previous scalable algorithm designed for rMDS. We note that the 
Cayton-Dasgupta algorithm REE does not exactly solve the r^MDS problem. Instead, it takes a non-Euclidean 
distance matrix and finds a Euclidean distance matrix that minimizes the error without any rank restrictions. 
Thus, as suggested by the authors [8], to properly compare the algorithms, we let CD refer to running REE 
and then projecting the result to a fc-dimensional subspace using the SVD technique [35j (our plots show this 
projection after each step). With regards to each of these Euclidean measures we compare our algorithm with 
SMACOF and CD. We also compare with the popular SVD-based method II35II . which solves the related cMDS 
problem based on similarities, by seeding all three iterative techniques with the results of the closed-form 
SVD-based solution. 
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Then we consider the family of spherical MDS problems {c,g}-{l,2}-sMDS. We compare against a version 
of SMACOF-Q [13] that is designed for data restricted to a low dimensional sphere, specifically for the c-2- 
SMDS measure. We compare this algorithm to ours under the c-2-SMDS measure (for a fair comparison with 
SMACOF-Q) and under the g-l-SMDS measure which is the most robust to noise. 

The subsections that follow focus on individual cost measures. We then discuss the overall behavior of our 



algorithm in Section 5.5 



Data sets, code, and setup. Test inputs for the algorithms are generated as follows. We start with input 
consisting of a random point set with n = 300 points in R'' for d = 200, with the target space T = with 
k = 10. Many data sets in practice have much larger parameters n and d, but we limit ourselves to this range 
because for larger values CD becomes prohibitively slow. The data is generated to first lie on a fc-dimensional 
subspace, and then (full-dimensional) Poisson noise is applied to all points up to a magnitude of 30% of the 
variation in any dimension. Finally, we construct the Euclidean distance matrix D which is provided as input to 
the algorithms. 

These data sets are Euclidean, but "close" to fc-dimensional. To examine the behavior of the algorithms on 
distance matrices that are non-Euclidean, we generate data as before in a fc-dimensional subspace and generate 
the resulting distance matrix D. Then we perturb a fraction of the elements of D (rather than perturbing the 
points) with Poisson noise. The fraction perturbed varies in the set (2%, 10%, 30%, 90%). 

All algorithms were implemented in MATLAB. For SMACOF, we used the implementation provided by Bron- 
stein [7], and built our ovm implementation of SMACOF-Q around it. For all other algorithms, we used our 
own implementatiorj^ In all cases, we compare performance in terms of the error function Err as a function of 
clock time. 



5.1 The rMDS Problem 



Figure 2(a) shows the cost function Err associated with rMDS plotted with respect to runtime. PlaceCenter 
always reaches the best local minimum, partially because only PlaceCenter can be adjusted for the rMDS 
problem. We also observe that the runtime is comparable to SMACOF and much faster than CD in order to get 
to the same Err value. Although SMACOF initially reaches a smaller cost that PC, it later converges to a larger 
cost because it optimizes a different cost function (fMDS). 



We repeat this experiment in Figure 2(b) [ for different values of k (equal to {2, 20, 50, 150}) to analyze the 



performance as a function of k. Note that PC performs even better for lower k in relation to CD. This is likely 
as a result of CD's reliance on the SVD technique to reduce the dimension. At smaller k, the SVD technique 
has a tougher job to do, and optimizes the wrong metric. Also for k = 150 note that CD oscillates in its cost; 
this is again because the REE part finds a nearby Euclidean distance matrix which may be inherently very high 
dimensional and the SVD projection is very susceptible to changes in this matrix for such large k. We observe 
that SMACOF is the fastest method to reach a low cost, but does not converge to the lowest cost value. The 
reason it achieves a cost close to that of PC is that for this type of data the rMDS and fMDS cost functions are 
fairly similar. 

In Figure |3] we evaluate the effect of changing the amount of noise added to the input distance matrix D, as 
described above. We consider two variants of the CD algorithm, one where it is seeded with an SVD -based seed 
(marked CD-I- SVD) and one where it is seeded with a random projection to a ?c-dimensional subspace (marked 
CD-I-rand). In both cases the plots show the results of the REE algorithm after SVD-type projections back to a 
fc-dimensional space. 

The CD-I-SVD technique consistently behaves poorly and does not improve with further iterations. This 
probably is because the REE component finds the closest Euclidean distance matrix which may correspond 
to points in a much high dimensional space, after which it is difficult for the SVD to help. The CD-j-rand 



^All of our code may be found at http : / / www . cs . Utah . edu/ ~arvind/smds . html 
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Figure 2: (a) rMDS: Typical behavior of PC, CD and SMACOF. (b) rIVIDS: Variation with fc = 2,20,50, 150. 



approach does much better, likely because the random projection initializes the procedure in a reasonably 
low dimensional space so REE can find a relatively low dimension Euclidean distance matrix that is nearby. 
SMACOF is again the fastest algorithm, but with more noise, the difference between fMDS and rMDS is larger, 
and thus SMACOF converges to a configuration with much higher cost than PC. We reiterate that PC consistently 
converges to the lowest cost solution among the different methods, and consistently is either the fastest or is 
comparable to the fastest algorithm. We will see this trend repeated with other cost measures as well. 



5.2 The fMDS Problem 



We next evaluate the algorithms PC, SMACOF, and CD under the fMDS distance measure. The results are very 
similar to the rMDS case except now both SMACOF and PC are optimizing the correct distance measure and 
converge to the same local minimum. SMACOF is still slightly faster that PC, but since they both run very fast, 
the difference is of the order of less than a second even in the very worst part of the cost/time tradeoff curve 



shown in Figure 4(a) Note that CD performs poorly under this cost function here except when k = 50. For 
smaller values of k, the SVD step does not optimize the correct distance and for larger k the REE part is likely 
finding an inherently very high dimensional Euclidean distance matrix, making the SVD projection very noisy. 
For the fMDS measure, SMACOF and PC perform very similarly under different levels of noise, both con- 



verging to similar cost functions with SMACOF running a bit faster, as seen in Figure 4(b) CD consistently runs 
slower and converges to a higher cost solution. 



5.3 The r^MDS Problem 

In this setting we would expect CD to perform consistently as well as PC because both minimize the same cost 
function. However, this is not always the case because CD requires the SVD step to generate a point set in r'^. 
As seen in Figure |5] this becomes a problem when k is small (fc = 2, 10). For medium values of k, CD converges 
slightly faster than PC and sometimes to a slightly lower cost solution, but again for large k (= 150), the REE 
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part has trouble handling the amount of error and the solution cost oscillates. SMACOF is again consistently 
the fastest to converge, but unless k is very large (i.e. k = 150) then it converges to a significantly worse 
solution because the fMDS and r^MDS error functions are different. 



5.4 The Spherical IVIDS Problem 

For the spherical MDS problem we compare PC against SMACOF-Q, an adaptation of SMACOF to restrict data 
points to a low-dimensional sphere, and a technique of Elad, Keller and Kimmel []14ll . It turns out that the Elad 
et.al. approach consistently performs poorly compared to both other techniques, and so we do not display it in 
our reported results. SMACOF-Q basically runs SMACOF on the original data set, but also adds one additional 
point po St the center of the sphere. The distance d^ i between any other point p; and po is set to be 1 thus 
encouraging all other points to be on a sphere, and this constraint is controlled by a weight factor k, a larger k 
implying a stronger emphasis on satisfying this constraint. Since the solution produced via this procedure may 
not lie on the sphere, we normalize all points to the sphere after each step for a fair comparison. 



Here we compare PC against SMACOF-Q in the g-l-SMDS (Figure [6(a)] ) and the c-2-SMDS (Figure [6(b)] ) 
problem. For g-l-SMDS, PC does not converge as quickly as SMACOF-Q with small k, but it reaches a better 
cost value. However, when SMACOF-Q is run with a larger k, then PC runs faster and reaches nearly the 
same cost value. For our input data, the solution has similar g-l-MDS and c-l-MDS cost. When we compare 



SMACOF-Q with PC under c-2-MDS (Figure [6(b)] ) then for an optimal choice of k in SMACOF-Q, both PC and 
SMACOF-Q perform very similarly, converging to the same cost function and in about the same time. But for 
larger choices of k SMACOF-Q does much worse than PC. 

In both cases, it is possible to find a value of k that allows SMACOF-Q to match PC. However, this value 
is different for different settings, and varies from input to input. The key observation here is that since PC is 
parameter-free, it can be run regardless of the choice of input or cost function, and consistently performs well. 



5.5 Summary Of Results 

In summary, here are the main conclusions that can be drawn from this experimental study. Firstly, PC is 
consistently among the top performing methods, regardless of the choice of cost function, the nature of the 
input, or the level of noise in the problem. Occasionally, other methods will converge faster, but will not in 
general return a better quality answer, and different methods have much more variable behavior with changing 
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Figure4: (a) fMDS: Variation with = 2,20,50, 150. (b) fMDS: Variation with noise= 2, 10, 30,90. 
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Figure 5: r^MDS: Variation with /c = 2,20,50, 150. 



inputs and noise levels. 



6 A JL Lemma for Spherical Data 

In this section we present a Johnson-Lindenstrauss-style bound for mapping data from a high dimensional 
sphere to a low-dimensional sphere while preserving the distances to within a multiplicative error of (1 + e). 

Consider a set 7 c S'' c R'^^^ of n points, defining a distance matrix D where the element d; j represents 
the geodesic distance between j; and jj on S*^. We seek an embedding of Y into S'^ that preserves pairwise 
distances as much as possible. For a set y e S'' and a projection n{Y) =X d we say the X has y-distortion 
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Figure 6: (a) g-l-SMDS: Comparing PC with SMACOF-Q for different values of penalty parameter k. (b) 
C-2-SMDS: Comparing PC with SMACOF-Q for different values of penalty parameter k. 



from Y if these exists a constant c such that for all X;,Xj 

(1 - r)f(yi,yj) < cf(xi,xj) < (i + r)f(yi,yjl 

For a subspace H = R*^, let 71^(7) be the projection of Y G R'' onto H and then scaled by d/k. For X G R*^, 
let S{X) be the projection to S*^~^, that is for all x eX, the corresponding point in S(X) is x/||x||. 

When /(yi.jj) = Hj; — yjll, and 7 e R'', then the Johnson-Lindenstrauss (JL) Lemma [19] says that if 
H c R'' is a random fc-dimensional linear subspace with k = 0((l/£^)log(n/5)), then X = 71^(7) has s- 
distortion from Y with probability at least 1 — 5. 

We now present the main result of this section. We note that recent results p~| have shown similar results 
for point on a variety of manifolds (including spheres) where projections preserve Euclidean distances. We 
reiterate that our results extend this to geodesic distances on spheres which can be seen as angle Z^ y between 
the vectors to points x,y e S*^. Another recent result [12611 shows that k = 0((l/e^)log(n/5)) dimensions 
preserves ^/i-distortion in angles, which is weaker than the following result. 

Theorem 6.1. Let Y <z S"^ a R'^+'^, and let H = R''+'^ be a random subspace ofR'^ with k = 0((l/e^)log(n/5)) 
with e e (0, 1/4]. Let f{yi,yj) measure the geodesic distance on S'^ (or S'^ as appropriate). Then Si^Tif^i^Y)) has 
e-distortion from Y with probability at least 1 — 5. 

This implies that if we project n data points that lie on any high-dimensional sphere to a low-dimensional 
sphere s'^ with k ~ logn, then the pairwise distances are each individually preserved. Before we proceed with 
the proof, we require a key technical lemma. 

Lemma 6.1. For s e [0,0.5] and x e [0,0.7], 

(1) sin((l - 2£)x) < (1 - e)sin(x), and 

(2) sin((l + 2e)x) >(1 + e)sm(x). 
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origin 



Figure 7: Illustration of the bounds on when ||yy — y,|| < 1/2. The angle Z„max max is the largest when 
ll^maxji _ jg small as possible (lies on inner circle) and — x™^"]] is as large as possible (on 

the outer edges of the disks of diameter e/8 shifted down from dashed line of length ||y; — jjH- Bounds for 
X™" and X™™ are derived symmetrically. 



Proof. Let gj.(x) = (1 - £)sinx - sin((l - 2e)x). We will show that for x e [0, 1] and e e [0,0.5], gg{x) is 
concave. This implies that it achieves its minimum value at the boundary. Now gg(0) = for all e, and it can be 
easily shown that ge(0.7) > for e e [0, 0.5]. This will therefore imply that g^Cx) > in the specified range. 
It remains to show that ge(x) is concave in [0, 0.7]. 

g^^(x) = (1 - 2e)2 sin((l - 2e)x) - (1 - e) sinx 
< (1 - e)(sin((l - 2f-)x) - sinx) 

which is always negative for e e [0, 0.5] and since sinx is increasing in the range [0, 0.7]. 

This proves the first part of the lemma. For the second part, observe that hgCx) = sin((l+2e)x)— (l+e)5m(x) 
can be rewritten as /Tg(x) = g_e(— x). The rest of the argument follows along the same lines, by showing that 
^^(x) is concave in the desired range using that hg(x) = g"^(—x). 

While the upper bound of 0.7 on x is not tight, it is close. The actual bound (evaluated by direct calculation) 
is slightly over 0.72. □ 



Proof of Theorem 6.1 Let X = 71^(7). We consider two cases, {Short Case) when Hj; — JjW < 1/2 and (Long 
Case) when Hy^ - e (1/2, 2]. 

Short Case: First consider points j;, jj e S'' such that | Ij; — yj 1 1 < 1/2. Note that I Ij; — 1 1 = 2 sini^Zy.y. / 2), 
since = = 1. By JL, we know that there exists a constant c such that 

(l-e/8)||yi-y,-|| <c||x;-x,.|| <(l + £/8)||y;-y^-||. 



= c||x_,-|| = 



We need to compare the angle Z^. with that of Zy. y.. The largest Z^ .f. can be is when c||x; 
(1 — e/8) is as small as possible, and so ||cXj — cXj\ \ = (1 + e/8)||y; — yjl | is as large as possible. See Figure 7 
In this case, we have 

(||cXi|| + ||cx^.||)sin(Z^__^./2) < \\cxi-cxj\\ 

2(l-e/8)sin(Z,_,,y2) < (1 + £/8)||yi - y,|| 
2(1 - e/8) sin(Z,_ /2) < (1 + e/8)2 sin(Z^^ /2) 



l + e/8 
1 - e/8 



sin(Z^,^,^,./2) < ^sin(Z^_ /2), 



which for e < 4 implies 



sin(Z,, /2) < (1 + e/2)sin(Z^^ /2). 



13 



Similarly, we can show when ^Xi,x is as small as possible (when c||X;|| = c||Xj|| = (1 + e) and ||cx; — cXj\ 
(l-^)lly;-y;ll),then 

(1 - s/2)smUy„y./2) < sin(Z,^ /2). 



We can also show (via Lemma 6.1 1 that since Hy^ — yjH < 1/2 implies ^y^^y^ < 0.7 we have 

sin((l - eVy,,y^) < (1 - e/2)sin(Z^^^^p 

and 



Thus, we have 



(1 + e/2) sin(Z^__^.) < sin((l + eVy^,y^l 



sin((l-£)Z3,_^^,/2) < sin(Z,^^,./2) < sin((l + e)Z3,^^^./2) 



(l-e)Z^^,^./2 < Z,^,,./2 < a + eVy,,yj2 



-yi,yj — Xi,Xj — V- ' "^^yi.jj- 

Long Case: For Hy; — yj\\ e (1/2,2], we consider 6 additional points y'^^j' e S'^"'"^ (for h e [1 : 6]) equally 
spaced between y; and yj on the shortest great circle connecting them. Let Y be the set Y plus all added points 
{ylf}h=[i:6]- Note that |7| = O(n^), so by JL we have that 

(1 - s/8)\\yi -yijW < c\\xi-Xij\\ < {1 + s/8)\\yi - ytjll 

For notational convenience let y; = y^f andy^ = y^^y Since for Hy;— yj|| e (1/2,2] then ||y|''|''— y|''|^^'*|| < 1/2, 
for h e [0 : 6]. This follows since the geodesic length of the great circular arc through y; and yj is at most 

71, and n/7 < 1/2. Then the chordal distance for each pair lly^j'' — yl''!^^^!! is upper bounded by the geodesic 
distance. Furthermore, by invoking the short case, for any pair 

(1 - e)Z (h) (h+i) < Z (h) (h+i) < (1 + e)Z (h) (h). 

Then since projections preserve coplanarity (specifically, the points and y\^^ for h e [0 : 7] are coplanar, 

hence and x^^^ for h e [0 : 7] are coplanar), we can add together the bounds on angles which all lie on a 
single great circle. 

6 6 

(1-£)Z < (1 -e) VZ m (h+1) < Y.f,=Q^AK) ih+i) <(l + e)Vz m <min{7r,(l + e)Z } 

h=0 h=0 

(1 - e^/i.^j ^ ^Xi,xj ^ min{7i, (1 + e)Z-y, -j,.}. □ 
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